
Data Import in R

In data analysis, most data is not created in R itself. It comes from external sources such as data collection software, instruments, spreadsheets like Excel, and the internet. This chapter shows how to import such data into R as the first step of an analysis. Readers can work through the chapter in order or pick the topics that match their needs and time.

Key topics include symbol-separated files, Excel files, JSON files, and R-supported data formats like RData and RDS. Other formats will be discussed in the "Common Issues and Solutions" section as supplementary content.

Symbol-Separated Files

Symbol-separated files are among the most commonly used data formats, so knowing how to import them is essential. Here, "symbol" means the delimiter that separates fields. The most common delimiters are the comma (,) and the tab character (\t). Files that use them are called CSV (comma-separated values) and TSV (tab-separated values) files, respectively.

CSV

CSV files typically have the .csv extension. The extension does not affect the file's content but helps quickly identify the format and aids automatic interpretation by some software. For example, a CSV file representing student grades might look like this:

student,chinese,math,english
stu1,99,100,98
stu2,60,50,88

Base R's read.table() function imports many kinds of delimited files. With the text argument it can also read data directly from a string. header = TRUE treats the first line as column names, and sep = "," sets the delimiter:

stu <- read.table(text = "
student,chinese,math,english
stu1,99,100,98
stu2,60,50,88
", header = TRUE, sep = ",")
stu
class(stu)

More often, the data is stored in a file. To import a CSV file with read.table(), pass its path, here relative to the working directory:

cars <- read.table(file = "data/data-import/mtcars.csv", header = TRUE, sep = ",")

Check the first few rows:

head(cars)

Alternatively, use read.csv(). It is a wrapper around read.table() whose defaults (header = TRUE, sep = ",") already suit CSV files, so these arguments can be omitted:

cars2 <- read.csv(file = "data/data-import/mtcars.csv")
head(cars2)
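If the chapter's data/data-import/mtcars.csv file is not at hand, you can try read.csv() in a self-contained way. Write a built-in data set to a temporary file with write.csv() and read it back. This is a minimal sketch. The temporary path comes from tempfile() and is not part of the chapter's data:

```r
# Write the built-in mtcars data set to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Read it back; read.csv() uses header = TRUE and sep = "," by default
cars_tmp <- read.csv(csv_path)
dim(cars_tmp)                              # 32 rows, 11 columns
identical(names(cars_tmp), names(mtcars))  # column names survive the round trip

unlink(csv_path)  # remove the temporary file
```

Setting row.names = FALSE keeps write.csv() from adding an extra, unnamed column for the row names.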

Efficient Data Import with readr and data.table

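For larger datasets or when performance is critical, the readr and data.table packages are recommended. Their readers are typically much faster than base R functions such as read.csv(). To compare them, first write a larger test file to a temporary location (temp_csv) and time the base R reader:

# Test file: the rows of mtcars repeated 10,000 times (320,000 rows)
temp_csv <- tempfile(fileext = ".csv")
write.csv(mtcars[rep(seq_len(nrow(mtcars)), 10000), ], temp_csv, row.names = FALSE)
time1 <- system.time(
z1 <- read.csv(temp_csv)
)
time1

Using readr:

library(readr)
time2 <- system.time(
z2 <- read_csv(temp_csv)
)
time2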

Using data.table:

library(data.table)
time3 <- system.time(
z3 <- fread(temp_csv)
)
time3

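Check the class of each imported object. Base R returns a plain data.frame, read_csv() returns a tibble, and fread() returns a data.table. The latter two also inherit from data.frame: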

class(z1)
class(z2)
class(z3)

TSV and Other CSV Variants

TSV files use the tab character as a delimiter and can be imported similarly by specifying sep = "\t":

mt <- read.table("data/data-import/mtcars.tsv", sep = "\t", header = TRUE)
mt

Using readr:

mt2 <- read_tsv("data/data-import/mtcars.tsv")
mt2

Using data.table:

mt3 <- fread("data/data-import/mtcars.tsv")
mt3
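Another common CSV variant uses semicolons to separate fields. Spreadsheet software produces it in locales that use a comma as the decimal mark. Base R's read.csv2() defaults to sep = ";" and dec = ",", and read.delim() is preset for tab-separated text with a header. The inline data below is made up for illustration:

```r
# Semicolon-separated data with decimal commas (illustrative values)
semi <- read.csv2(text = "
student;height;weight
stu1;1,72;65,5
stu2;1,80;70,0
")
semi
str(semi)  # height and weight are parsed as numeric

# read.delim(): read.table() with header = TRUE and sep = "\t"
tab <- read.delim(text = "student\tmath\nstu1\t100\nstu2\t50")
tab
```

For other delimiters, read.table() with an explicit sep (or readr::read_delim() with delim) covers the remaining cases.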

Excel

Excel files are widely used for data storage and processing. The readxl package can be used to import data from Excel files:

library(readxl)
mt_excel <- read_excel("data/data-import/mtcars.xlsx")
head(mt_excel)

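To list the sheets in a workbook and then read one by name, use excel_sheets() and the sheet argument. The example below assumes the datasets.xlsx example workbook that ships with readxl, which contains an "iris" sheet:

# Path to an example workbook bundled with readxl
excel_path <- readxl_example("datasets.xlsx")
excel_sheets(excel_path)
iris <- read_excel(excel_path, sheet = "iris")
head(iris)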

JSON

JSON is a lightweight data-interchange format. The jsonlite package is widely used for converting between JSON and R objects. toJSON() shows how common R objects are represented in JSON:

jsonlite::toJSON(letters)
jsonlite::toJSON(c(a = 1L, b = 2.0))
jsonlite::toJSON(data.frame(a = 1:3, b = 2:4))
jsonlite::toJSON(list(a = 1L, b = 2:5, c = c(TRUE, FALSE), d = NULL))

Save JSON data to a file:

jsonlite::write_json(list(a = 1L, b = 2:5, c = c(TRUE, FALSE), d = NULL), path = "data/data-import/example.json")

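Read the file back with read_json(). By default it returns a list that mirrors the JSON structure. With simplifyVector = TRUE, JSON arrays are simplified to atomic vectors: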

jsonlite::read_json("data/data-import/example.json")
jsonlite::read_json("data/data-import/example.json", simplifyVector = TRUE)
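JSON is often already in memory as a string, for example a web API response. In that case jsonlite::fromJSON() parses it directly. The round trip below through toJSON() and fromJSON() shows how a data frame maps to an array of JSON objects (one per row) and back. The data frame is illustrative:

```r
library(jsonlite)

df <- data.frame(a = 1:3, b = c("x", "y", "z"))

# Row-wise encoding: one JSON object per row
js <- toJSON(df)
js

# fromJSON() simplifies an array of objects back into a data frame
back <- fromJSON(js)
str(back)
```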

R Data Files

Using R's native data storage formats, RData and RDS, is efficient and common for saving and loading R objects.

RData

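RData files, written with save(), can hold several objects at once. load() restores them into the current environment under their original names:

# d1 and d2 are example objects; any R objects can be saved this way
d1 <- mtcars[1:10, ]
d2 <- mtcars[11:20, ]
save(d1, d2, file = "data/data-import/mtcars.RData")
rm(d1, d2)  # remove them so that load() visibly restores them
load("data/data-import/mtcars.RData")
ls()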
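Because load() writes objects straight into the calling environment, it can silently overwrite existing variables with the same names. A safer pattern is to load into a fresh environment. The sketch below uses a temporary file and illustrative objects. load() returns the names of the objects it restored:

```r
# Illustrative objects saved to a temporary RData file
small_cars <- head(mtcars)
mpg_summary <- summary(mtcars$mpg)
rdata_path <- tempfile(fileext = ".RData")
save(small_cars, mpg_summary, file = rdata_path)

# Load into a separate environment instead of the global one
env <- new.env()
loaded <- load(rdata_path, envir = env)
loaded          # names of the restored objects
env$small_cars  # access an object through the environment

unlink(rdata_path)
```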

RDS

An RDS file stores a single object. readRDS() returns that object instead of restoring it under its original name, so you can assign it to any variable:

saveRDS(mtcars, file = "data/data-import/mtcars.rds")
mtcars_rename <- readRDS("data/data-import/mtcars.rds")
head(mtcars_rename)
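An RDS file holds one serialized object, so the round trip reproduces it exactly, including attributes such as row names and factor levels. The check below uses a temporary file rather than the chapter's data directory. It also refers to datasets::iris explicitly, in case iris has been overwritten in the workspace:

```r
rds_path <- tempfile(fileext = ".rds")

# iris has a factor column (Species), so its levels are part of what is preserved
saveRDS(datasets::iris, file = rds_path)
iris_copy <- readRDS(rds_path)

identical(iris_copy, datasets::iris)  # values, types and attributes all match
levels(iris_copy$Species)

unlink(rds_path)
```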

Common Issues and Solutions

Loading Data from Clipboard

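Data copied to the clipboard, such as a range of cells in Excel, can be read with read.table(). Spreadsheet cells are copied as tab-separated text, so set sep = "\t". The "clipboard" connection works on Windows. On macOS, read from pipe("pbpaste") instead:

data <- read.table("clipboard", header = TRUE, sep = "\t")
# On macOS:
# data <- read.table(pipe("pbpaste"), header = TRUE, sep = "\t")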

Reading Line by Line

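readLines() reads a text file into a character vector with one element per line. n = -1, the default, reads all lines. This is useful for irregular files that read.table() cannot parse directly:

# Write a small example file: a title line, data lines and an empty line
fil <- tempfile(fileext = ".data")
cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = fil, sep = "\n")
readLines(fil, n = -1)
unlink(fil) # Clean up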
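readLines() can also read incrementally from an open connection, which helps when a file is too large to load at once. Each call continues where the previous one stopped. A minimal sketch that processes a temporary file two lines at a time (the chunk size of 2 is arbitrary):

```r
fil <- tempfile(fileext = ".data")
cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = fil, sep = "\n")

con <- file(fil, open = "r")  # an open connection keeps its read position
n_lines <- 0
repeat {
  chunk <- readLines(con, n = 2)
  if (length(chunk) == 0) break  # end of file reached
  n_lines <- n_lines + length(chunk)
  print(chunk)
}
close(con)
n_lines
unlink(fil)
```

Opening the connection explicitly matters. If readLines() is given an unopened connection, it opens and closes it on each call and would start from the beginning every time.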

Fixed-Width File Format

Use read.fwf() or readr::read_fwf() for fixed-width format files.
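In a fixed-width file, each field occupies the same character positions on every line, so columns are defined by widths rather than delimiters. The minimal sketch below uses read.fwf() on the student grades from earlier, laid out in columns of 4, 3, 3 and 3 characters. The layout is invented for illustration:

```r
fwf_path <- tempfile(fileext = ".txt")
# Columns: student (1-4), chinese (5-7), math (8-10), english (11-13)
writeLines(c("stu1 99100 98",
             "stu2 60 50 88"), fwf_path)

grades <- read.fwf(fwf_path,
                   widths = c(4, 3, 3, 3),
                   col.names = c("student", "chinese", "math", "english"),
                   strip.white = TRUE)  # passed on to read.table(); trims padding
grades
unlink(fwf_path)
```

readr::read_fwf() does the same when given the widths through fwf_widths(), and it trims surrounding whitespace by default.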